Auto-tuned nested parallelism: A way to reduce the execution time of scientific software in NUMA systems
Authors
Abstract
Scientific and engineering problems are solved with large parallel systems. In some cases those systems are NUMA: a large number of cores share a hierarchically organized memory. The kernel of the computation for those problems is BLAS or a similar library, so efficient use of those kernels means a faster solution of a large range of scientific problems. Normally a multithreaded BLAS library optimized for the system is used, but as the number of cores increases, the degradation in performance grows. In this work the behaviour on NUMA systems of an example of a high-level routine, an LU factorisation, is analysed. An improved scheme is proposed that combines the multithreaded dgemm of BLAS with OpenMP in nested parallelism, together with an auto-tuning method that yields a reduction in the execution time.
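To make the nested scheme concrete, here is a minimal sketch in C, assuming OpenBLAS as the BLAS and a trailing-matrix update of the kind that dominates a blocked LU factorisation: an outer OpenMP loop distributes column blocks across outer_threads threads, and each iteration calls a multithreaded dgemm configured to use inner_threads threads. The function name trailing_update, the knobs outer_threads/inner_threads, and the block size nb are illustrative, not taken from the paper; openblas_set_num_threads() is OpenBLAS-specific, and whether inner threads are actually spawned inside an OpenMP region depends on how the library was built.

#include <stdlib.h>
#include <omp.h>
#include <cblas.h>

/* OpenBLAS-specific control of the BLAS thread count; other BLAS
 * libraries expose different mechanisms for this. */
extern void openblas_set_num_threads(int num_threads);

/* Trailing update C := C - A*B over column blocks of width nb.
 * A is m x k, B is k x n, C is m x n, all column-major. */
static void trailing_update(int m, int n, int k, int nb,
                            const double *A, const double *B, double *C,
                            int outer_threads, int inner_threads)
{
    omp_set_max_active_levels(2);            /* enable two-level nesting */
    openblas_set_num_threads(inner_threads); /* threads per dgemm call   */

    #pragma omp parallel for num_threads(outer_threads) schedule(dynamic)
    for (int j = 0; j < n; j += nb) {
        int jb = (n - j < nb) ? (n - j) : nb;
        /* C(:, j:j+jb-1) -= A * B(:, j:j+jb-1), one multithreaded
         * dgemm per column block. */
        cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                    m, jb, k,
                    -1.0, A, m,
                          B + (size_t)j * k, k,
                     1.0, C + (size_t)j * m, m);
    }
}

An auto-tuner in the spirit of the abstract would then time this routine for the (outer_threads, inner_threads) pairs whose product does not exceed the core count and keep the fastest pair for each problem size; on NUMA machines the best split is rarely all threads on a single level.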
Similar resources
Multigrain Parallelization for Model-Based Design Applications Using the OSCAR Compiler
Model-based design is a very popular software development method for developing a wide variety of embedded applications such as automotive systems, aircraft systems, and medical systems. Model-based design tools like MATLAB/Simulink typically allow engineers to graphically build models consisting of connected blocks for the purpose of reducing development time. These tools also support automati...
Tiling and Scheduling of Three-level Perfectly Nested Loops with Dependencies on Heterogeneous Systems
Nested loops are one of the most time-consuming parts and the largest sources of parallelism in many scientific applications. In this paper, we address the problem of 3-dimensional tiling and scheduling of three-level perfectly nested loops with dependencies on heterogeneous systems. To exploit the parallelism, we tile and schedule nested loops with dependencies by awareness of computational po...
Parallélisme des nids de boucles pour l'optimisation du temps d'exécution et de la taille du code (Nested loop parallelism to optimize execution time and code size)
Real-time implementation algorithms always include nested loops, which require significant execution time. Thus, several nested loop parallelism techniques have been proposed with the aim of decreasing their execution times. These techniques can be classified in terms of granularity: iteration-level parallelism and instruction-level parallelism. In the case of the instructio...
A Tool Environment for Efficient Execution of Shared Memory Programs on NUMA Systems
One of the most important performance issues on NUMA systems is data locality, since remote memory accesses have latencies several orders of magnitude higher than local memory accesses. This paper presents a tool environment aimed at tuning NUMA-based shared memory applications towards better memory locality. This tool environment comprises tools, supporting system facilities, and their interface. To...
A Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints
One of the main features of High Throughput Computing systems is the availability of high-power processing resources. Cloud computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...
Journal: Parallel Computing
Volume: 40, Issue: -
Pages: -
Publication year: 2014